Contract/Grant Title: Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models
Contract/Grant #: FA9550-11-1-0239

Reporting Period: September 1, 2011 to August 31, 2014

Overview: This project focused on the development of novel
derivative-free optimization methods that rely on recent techniques
and models from statistical learning. The main idea of these methods
is to build local models of the objective function from randomly
sampled data points. This approach allows us to construct fairly
accurate models from a relatively small number of samples. The key
difference from deterministic sampling approaches is that these
models are accurate only with some high probability, not always.
Moreover, it is not known when these models are accurate; only the
probability that an accurate model occurs is known. Under these
conditions, a new convergence theory had to be developed, and this
has been the focus of our research. Below we list specific
contributions.

Sparse Hessian models: We have developed theory and an
implementation for recovering sparse Hessian information in
derivative-free optimization. We showed that quadratic interpolation
models computed by partial $\ell_1$-minimization recover the Hessian
sparsity of the function being modeled. When the unknown Hessian of
the function is sufficiently sparse, such models often achieve the
accuracy of second-order Taylor models with very few random sample
points. In particular, constructing an accurate second-order model of
an n-dimensional smooth function in general requires $O(n^2)$ local
sample points. We have shown that if the Hessian of the function
contains s nonzeros, then only $O((n+s)\log^4 n)$ samples are needed.
Our results rely on compressed sensing theory and the analysis of
structured random matrices.
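The partial $\ell_1$-minimization idea can be illustrated with a small Python sketch, assuming NumPy and SciPy (`scipy.optimize.linprog` with the HiGHS solver). The helper names `quadratic_basis` and `partial_l1_model` are hypothetical, the sample set is drawn uniformly from $[-1,1]^n$, and the quadratic part uses the unscaled basis $x_i^2/2$, $x_i x_j$; the basis scaling, sampling region, and solver used in the actual work may differ. The constant and linear coefficients are left unpenalized, and only the Hessian coefficients enter the $\ell_1$ objective, which is cast as a linear program by splitting them into nonnegative parts.

```python
import numpy as np
from scipy.optimize import linprog


def quadratic_basis(Y):
    """Evaluate the natural quadratic basis at the rows of Y.

    Linear block:    1, x_1, ..., x_n            (left unpenalized)
    Quadratic block: x_i**2 / 2 and x_i * x_j (i < j), so the quadratic
    coefficients coincide with the upper-triangular Hessian entries.
    """
    m, n = Y.shape
    lin = np.hstack([np.ones((m, 1)), Y])
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    quad = np.column_stack(
        [0.5 * Y[:, i] ** 2 if i == j else Y[:, i] * Y[:, j] for i, j in pairs]
    )
    return lin, quad, pairs


def partial_l1_model(Y, fY):
    """Solve  min ||a_Q||_1  s.t.  Phi_L a_L + Phi_Q a_Q = f(Y)  as a linear program.

    a_Q is split as u - v with u, v >= 0; a_L is a free variable.
    Returns (constant term, gradient, Hessian) of the recovered quadratic model.
    """
    lin, quad, pairs = quadratic_basis(Y)
    p, q = lin.shape[1], quad.shape[1]
    cost = np.concatenate([np.zeros(p), np.ones(2 * q)])  # sum(u) + sum(v)
    A_eq = np.hstack([lin, quad, -quad])
    bounds = [(None, None)] * p + [(0, None)] * (2 * q)
    res = linprog(cost, A_eq=A_eq, b_eq=fY, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    a_L = res.x[:p]
    a_Q = res.x[p:p + q] - res.x[p + q:]
    n = Y.shape[1]
    H = np.zeros((n, n))
    for (i, j), a in zip(pairs, a_Q):
        H[i, j] = H[j, i] = a
    return a_L[0], a_L[1:], H


# Synthetic example: n = 10 gives 11 + 55 = 66 quadratic coefficients,
# but the Hessian has only 3 nonzero upper-triangular entries.
rng = np.random.default_rng(0)
n, m = 10, 40
H_true = np.zeros((n, n))
H_true[0, 0], H_true[3, 3] = 2.0, -1.0
H_true[1, 4] = H_true[4, 1] = 0.5
g_true, c_true = rng.standard_normal(n), 1.0


def f(x):
    return c_true + g_true @ x + 0.5 * x @ H_true @ x


Y = rng.uniform(-1.0, 1.0, size=(m, n))   # 40 random samples around x = 0
fY = np.array([f(y) for y in Y])
c_hat, g_hat, H_hat = partial_l1_model(Y, fY)
print("max Hessian error:", np.abs(H_hat - H_true).max())
```

With 40 samples for 66 unknown coefficients the linear system is underdetermined, so plain least squares would not single out the true model; the $\ell_1$ objective is what favors the sparse Hessian. Recovery is only guaranteed with high probability, so a different random sample set could in principle fail.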

The results from this work were the topic of an invited semiplenary
talk at the International Symposium on Mathematical Programming in
2012, and the resulting paper received the INFORMS Optimization
Society Best Student Paper Prize.

Convergence of trust region methods based on random models.

Due to the randomness in the sampling process, the accuracy of the
models is also random; hence it was necessary to investigate the
convergence properties of trust region methods based on random
models.

Traditional analysis of model-based derivative-free optimization
methods relies on the worst-case behavior of the algorithmic steps
and of the models involved: the models and the iterates must satisfy
certain conditions to guarantee convergence. Such requirements are
difficult or costly to satisfy in practice and are often ignored in
practical implementations. We developed a probabilistic viewpoint for
such algorithms, showing that convergence still holds even if some of
these properties fail with a sufficiently small probability.

In fact, using martingale theory, we show that the probability of
obtaining an accurate model only needs to be at least one half. This
approach is the first of its kind and has already generated follow-up
research by others in the field, in particular extensions to the
stochastic setting and to convergence rates.
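As a rough illustration of why occasional bad models are tolerable, here is a toy Python sketch (NumPy only) of a derivative-free trust-region loop with linear models. At each iteration the model is, with an assumed probability `p_good`, a least-squares fit on random points in the trust region, and otherwise an arbitrary random vector unrelated to $f$. The function name, parameter values, and the acceptance rule (ratio test plus $\|g\| \ge \eta_2 \Delta$, then expand or shrink the radius by a factor $\gamma$) are assumptions for the sketch, not the project's exact algorithm. Because the ratio test uses exact function values, steps from bad models are rejected and only shrink the radius, while successful iterations enlarge it again; loosely, this balance is the intuition behind the one-half threshold.

```python
import numpy as np


def tr_random_models(f, x0, p_good, delta0=1.0, iters=300,
                     eta1=0.1, eta2=0.5, gamma=2.0, seed=0):
    """Toy derivative-free trust-region loop whose linear models are
    accurate only with probability p_good (illustration, not a solver)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx, delta, n = f(x), delta0, x.size
    for _ in range(iters):
        if rng.random() < p_good:
            # Accurate model: least-squares linear fit on 2n random points
            # in the box x + [-delta, delta]^n (model matches f at x).
            D = delta * rng.uniform(-1.0, 1.0, size=(2 * n, n))
            fD = np.array([f(x + d) for d in D]) - fx
            g, *_ = np.linalg.lstsq(D, fD, rcond=None)
        else:
            # Inaccurate model: a random "gradient" unrelated to f.
            g = rng.standard_normal(n)
        gnorm = np.linalg.norm(g)
        if gnorm == 0.0:
            delta /= gamma
            continue
        s = -delta * g / gnorm          # minimizer of the linear model on the ball
        pred = delta * gnorm            # predicted decrease m(x) - m(x + s)
        f_trial = f(x + s)
        rho = (fx - f_trial) / pred     # actual vs. predicted decrease
        if rho >= eta1 and gnorm >= eta2 * delta:
            x, fx = x + s, f_trial      # successful: accept and enlarge the region
            delta *= gamma
        else:
            delta /= gamma              # unsuccessful: reject and shrink the region
    return x, fx


def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 0.5) ** 2


x0 = np.array([-2.0, 2.0])
for p in (0.3, 0.5, 0.8):
    _, f_end = tr_random_models(f, x0, p_good=p)
    print(f"p_good = {p}: f(x_end) = {f_end:.3e}  (f(x0) = {f(x0):.1f})")
```

Every accepted step strictly decreases $f$, so the returned value never exceeds $f(x_0)$ regardless of `p_good`; how quickly it approaches the minimum depends on how often good models occur.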

We also developed an implementation for noisy derivative-free
optimization under various conditions on the noise. We tested
regularized models (ridge regression and SVM models) in the
derivative-free setting and extensively tested our software on a
complex protein alignment problem. The ridge regression models did
not produce a noticeable improvement over the regular regression
models, while the SVM models showed improvements in some cases. The
SVM work was performed with a visiting PhD student from a university
in Brazil, who is continuing to work on this topic for his thesis.
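A ridge-regularized linear model of the kind compared above can be sketched as follows, assuming NumPy, a linear model $m(x+d) = c + g^\top d$, and additive Gaussian noise on the function values. The name `ridge_linear_model` and the choice to leave the intercept unpenalized are assumptions of the sketch; the SVM models are not shown.

```python
import numpy as np


def ridge_linear_model(Y, fY, x, lam):
    """Fit m(x + d) = c + g @ d to samples (Y, fY) by ridge regression.

    Only the gradient g is shrunk toward zero; the intercept c is unpenalized.
    lam = 0 reduces to ordinary least squares.
    """
    D = Y - x                                   # displacements from the center x
    X = np.hstack([np.ones((D.shape[0], 1)), D])
    P = np.eye(X.shape[1])
    P[0, 0] = 0.0                               # no penalty on the intercept
    coef = np.linalg.solve(X.T @ X + lam * P, X.T @ fY)
    return coef[0], coef[1:]


rng = np.random.default_rng(1)
x = np.array([0.5, -0.5, 1.0])
grad_true = np.array([1.0, -2.0, 0.5])


def f_noisy(y):
    # Linear function plus additive Gaussian noise (an assumed noise model).
    return 3.0 + grad_true @ (y - x) + 0.1 * rng.standard_normal()


Y = x + 0.2 * rng.uniform(-1.0, 1.0, size=(12, x.size))
fY = np.array([f_noisy(y) for y in Y])
for lam in (0.0, 0.01, 0.1):
    c, g = ridge_linear_model(Y, fY, x, lam)
    print(lam, np.round(g, 3))
```

Setting `lam = 0` gives ordinary least-squares regression; increasing it shrinks the gradient estimate toward zero, which trades variance for bias.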


Convergence of trust region methods for stochastic functions.

The ultimate goal of this project is the development and analysis of
trust-region model-based algorithms for solving black-box stochastic
optimization problems. We proposed and analyzed a trust region
framework that utilizes random models of f(x). It also relies on
(random, noisy) estimates of the function values at the current
iterate to gauge the progress being made. The convergence analysis
then requires only that these models and these estimates be
sufficiently accurate with sufficiently high probability. Beyond
these conditions, no assumptions are made about how the models and
estimates are generated. When the estimates are accurate with
probability one, our results recover the convergence results for
deterministic functions based on random sample sets described in the
previous section. Our method applies to both black-box and
gradient-based optimization and to both biased and unbiased noise.
Our computational results show a substantial advantage of our
framework over existing methods under different noise models.
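To make the accuracy requirement on the estimates concrete, here is a Python sketch assuming that an estimate $\tilde f$ of $f(x)$ counts as accurate when $|\tilde f - f(x)| \le \varepsilon_f \Delta^2$ with probability at least $\beta$, that the noise is i.i.d. with known variance $\sigma^2$, and that estimates are sample averages. These assumptions and the helper names are illustrative, not taken from the report. Chebyshev's inequality then gives a sufficient number of samples $N \ge \sigma^2 / \big((1-\beta)(\varepsilon_f \Delta^2)^2\big)$.

```python
import math

import numpy as np


def samples_for_accurate_estimate(sigma, eps_f, delta, beta):
    """Chebyshev bound: averaging N i.i.d. evaluations with variance sigma**2 gives
    P(|mean - f(x)| > eps_f * delta**2) <= sigma**2 / (N * (eps_f * delta**2)**2).
    Requiring the right-hand side to be at most 1 - beta yields the N returned."""
    tol = eps_f * delta ** 2
    return math.ceil(sigma ** 2 / ((1.0 - beta) * tol ** 2))


def averaged_estimate(f_noisy, x, n_samples, rng):
    """Sample-average estimate of f(x) from noisy evaluations."""
    return float(np.mean([f_noisy(x, rng) for _ in range(n_samples)]))


sigma, eps_f, beta = 1.0, 0.5, 0.75
for delta in (1.0, 0.5):
    print(delta, samples_for_accurate_estimate(sigma, eps_f, delta, beta))

# Empirical check at delta = 1: how often is the averaged estimate accurate?
rng = np.random.default_rng(2)
x = np.array([1.0, 2.0])
f_true = float(x @ x)


def f_noisy(y, rng):
    # Assumed noise model: exact value plus Gaussian noise with std sigma.
    return float(y @ y) + sigma * rng.standard_normal()


N = samples_for_accurate_estimate(sigma, eps_f, 1.0, beta)
hits = sum(abs(averaged_estimate(f_noisy, x, N, rng) - f_true) <= eps_f
           for _ in range(2000))
print("empirical accuracy rate:", hits / 2000, "target beta:", beta)
```

The bound is conservative (with Gaussian noise the estimate is accurate far more often than $\beta$), and it grows like $\Delta^{-4}$ as the trust region shrinks, which is why halving $\Delta$ raises $N$ from 16 to 256.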

We also developed a novel general approach for generating
sufficiently accurate random models by randomly sampling the
objective function and constructing regression models (or other
statistical learning models) from these samples. Previously, models
of stochastic functions relied on Monte Carlo-type (repeated)
sampling of the function at deterministically selected points. Under
these conditions it was possible to show that an accurate model is
eventually obtained with increasingly high probability. Our work
shows that accurate models can be obtained from a much more general
class of models and sample sets. Moreover, we show that it is not
necessary for the probability of an accurate model to increase: it
can remain constant, as long as it is sufficiently high. Hence the
number of sample points required does not need to grow as rapidly as
in previous approaches. Our computational results show the benefits
of using random models and estimates. We have tested our approach on
the (noisy) protein alignment problem, which has been one of our
focus applications.
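The claim that an accurate model only needs to occur with a fixed, sufficiently high probability can be explored numerically. The Python sketch below (NumPy only) builds linear regression models from noisy samples drawn uniformly in a box of half-width $\Delta$ around $x$ and estimates by Monte Carlo the probability that the model gradient satisfies $\|g - \nabla f(x)\| \le \kappa \Delta$. This accuracy criterion, the noise model, and the helper names are assumptions chosen for illustration.

```python
import numpy as np


def regression_gradient(f_noisy, x, delta, m, rng):
    """Linear least-squares model built from m noisy samples drawn uniformly
    from the box x + [-delta, delta]^n; returns the model gradient."""
    n = x.size
    D = delta * rng.uniform(-1.0, 1.0, size=(m, n))
    fY = np.array([f_noisy(x + d, rng) for d in D])
    X = np.hstack([np.ones((m, 1)), D])          # intercept + linear terms
    coef, *_ = np.linalg.lstsq(X, fY, rcond=None)
    return coef[1:]


def accuracy_probability(f_noisy, grad, x, delta, m, kappa, trials, rng):
    """Monte Carlo estimate of P(||g_model - grad f(x)|| <= kappa * delta)."""
    g_true = grad(x)
    hits = sum(
        np.linalg.norm(regression_gradient(f_noisy, x, delta, m, rng) - g_true)
        <= kappa * delta
        for _ in range(trials)
    )
    return hits / trials


rng = np.random.default_rng(3)
x = np.array([1.0, -1.0])


def f_noisy(y, rng):
    # Smooth quadratic plus Gaussian noise (an assumed noise model).
    return float(y @ y) + 0.01 * rng.standard_normal()


def grad(y):
    return 2.0 * y


for m in (5, 10, 20):
    p = accuracy_probability(f_noisy, grad, x, delta=0.1, m=m, kappa=1.0,
                             trials=500, rng=rng)
    print(f"m = {m:2d}: estimated P(accurate) = {p:.2f}")
```

For a fixed noise level and radius, the estimated probability typically rises with the number of samples `m`; the point made in the text is that the algorithm only needs this probability to stay above a fixed threshold, rather than to approach one.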